A Bayesian Approach to Graphical Record Linkage: A Simulation Study
Abstract
We present a simulation study based on the model in §2.1. Using that model, we simulate data from the NLTCS with varying levels of distortion (0, 0.25%, 0.5%, 1%, 2%, and 5%), then run our MCMC algorithm on each simulated dataset to assess how well it matches records under "noisy data." Figure 3 shows an approximately linear relationship between the false negative rate (FNR) and the distortion level, and a near-exponential relationship between the false positive rate (FPR) and the distortion level. Figure 4 shows that for moderate per-field distortion levels, the estimated posterior densities recover the true number of observed individuals extremely well. Once the distortion becomes too large, however, the model has trouble recovering this value.
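The evaluation loop described above (distort each field at a fixed rate, link, then score the linkage) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The function names `distort`, `pairwise_error_rates`, and `num_observed_individuals` are hypothetical. The distortion scheme (each field independently replaced with probability `beta` by a value drawn uniformly from that field's domain) is an assumption, and the model in §2.1 may draw distorted values from a different distribution. The FNR and FPR here use the textbook pairwise definitions, which may differ from the exact definitions used for Figure 3. The MCMC linkage step itself is not shown; `est_links` stands in for one posterior linkage sample.

```python
import itertools
import random


def distort(records, beta, field_values, rng):
    """Return a copy of `records` where each field value is independently
    replaced, with probability `beta`, by a value drawn uniformly from that
    field's domain. ASSUMPTION: this uniform scheme is for illustration only;
    a replacement may coincide with the original value."""
    distorted = []
    for rec in records:
        new_rec = []
        for j, value in enumerate(rec):
            if rng.random() < beta:
                new_rec.append(rng.choice(field_values[j]))
            else:
                new_rec.append(value)
        distorted.append(tuple(new_rec))
    return distorted


def pairwise_error_rates(true_links, est_links):
    """Pairwise FNR and FPR of an estimated linkage.

    `true_links[i]` and `est_links[i]` are the latent-individual labels of
    record i; two records are linked when they share a label.
    ASSUMPTION: FNR = FN / (TP + FN) and FPR = FP / (FP + TN) over all pairs.
    """
    tp = fp = fn = tn = 0
    for i, k in itertools.combinations(range(len(true_links)), 2):
        truly_linked = true_links[i] == true_links[k]
        est_linked = est_links[i] == est_links[k]
        if truly_linked and est_linked:
            tp += 1
        elif truly_linked:
            fn += 1
        elif est_linked:
            fp += 1
        else:
            tn += 1
    fnr = fn / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return fnr, fpr


def num_observed_individuals(links):
    """Number of distinct latent individuals that at least one record links to."""
    return len(set(links))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical fields (sex, birth year, birth month) and toy records.
    domains = [["F", "M"], list(range(1900, 1990)), list(range(1, 13))]
    records = [("F", 1920, 3), ("M", 1931, 7), ("F", 1945, 11)]
    # Per-field distortion levels taken from the study above.
    for beta in [0.0, 0.0025, 0.005, 0.01, 0.02, 0.05]:
        print(beta, distort(records, beta, domains, rng))
    # Toy scoring: one missed link and one spurious link.
    print(pairwise_error_rates([0, 0, 1, 2], [0, 1, 1, 2]))
```

Applying `num_observed_individuals` to every posterior linkage sample yields draws from the posterior of the number of observed individuals, which is the quantity whose estimated density Figure 4 summarizes.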
Similar resources
SMERED: A Bayesian Approach to Graphical Record Linkage and De-duplication
We propose a novel unsupervised approach for linking records across arbitrarily many files, while simultaneously detecting duplicate records within files. Our key innovation is to represent the pattern of links between records as a bipartite graph, in which records are directly linked to latent true individuals, and only indirectly linked to other records. This flexible new representation of th...
Comparison of Estimates Using Record Statistics from Lomax Model: Bayesian and Non Bayesian Approaches
This paper addresses the problem of Bayesian estimation of the parameters, reliability and hazard function in the context of record statistics values from the two-parameter Lomax distribution. The ML and the Bayes estimates based on records are derived for the two unknown parameters and the survival time parameters, reliability and hazard functions. The Bayes estimates are obtained based on conju...
Comparison of two QTL mapping approaches based on Bayesian inference using high-dense SNPs markers
To compare different QTL mapping methods, a population with genotypic and phenotypic data was simulated. In the Bayesian approach, all information of markers can be used along with a combination of distributions of SNP markers. It is assumed that most of the markers (95%) have minor effects and a small number of markers (5%) exert major effects. The simulated population included a basic population of ...
Performance Bounds for Graphical Record Linkage
Record linkage involves merging records in large, noisy databases to remove duplicate entities. It has become an important area because of its widespread occurrence in bibliometrics, public health, official statistics production, political science, and beyond. Traditional linkage methods directly linking records to one another are computationally infeasible as the number of records grows. As a ...
Hierarchical Bayesian Record Linkage Theory
In record linkage, or exact file matching, one compares two or more files on a single population for purposes of unduplication or production of an enhanced, merged database. Record linkage has many applications, including in population enumeration efforts, to create databases for epidemiological investigations, and to improve survey sample frames. Latent class and mixture models have been used ...